Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
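Since the checkpoints are openly released, the model can be loaded with standard tooling. Below is a minimal sketch using the Hugging Face transformers library; the "bigscience/bloom-560m" identifier (one of the smaller released BLOOM variants) is an assumption about hosting, and any released checkpoint in the family can be substituted.

```python
# Minimal sketch: prompting an open-access BLOOM checkpoint with transformers.
# "bigscience/bloom-560m" is an assumed Hub identifier for a small variant.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```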
Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can download pretrained models of this scale. Still, using these models requires high-end hardware that is unavailable to many researchers. In some cases, LLMs can be used more affordably via RAM offloading or hosted APIs. However, these techniques have innate limitations: offloading is too slow for interactive inference, while APIs are not flexible enough for research. In this work, we propose Petals, a system for inference and fine-tuning of large models collaboratively by joining the resources of multiple parties trusted to process client data. We demonstrate that this strategy significantly outperforms offloading for very large models, running inference of BLOOM-176B at approximately one step per second. Unlike most inference APIs, Petals also natively exposes the hidden states of served models, allowing its users to train and share custom model extensions based on efficient fine-tuning methods.
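A hedged sketch of what client-side use of such a system looks like is shown below, following the Petals project's documented pattern: only a thin shim runs locally, while transformer blocks execute on remote peers. The class and checkpoint names follow the project's README and may differ across versions.

```python
# Collaborative inference sketch, assuming the `petals` package and a public
# swarm serving "bigscience/bloom"; names are taken from the project README.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "bigscience/bloom"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Only embeddings and the head are loaded locally; transformer blocks are
# executed remotely by other participants in the swarm.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```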
Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for the feed-forward and attention projection layers in transformers, which cuts the memory needed for inference in half while retaining full precision performance. With our method, a 175B-parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without degradation. This is made possible by understanding and working around properties of highly systematic emergent features in transformer language models that dominate attention and transformer predictive performance. To cope with these features, we develop a two-part quantization procedure, LLM.int8(). We first use vector-wise quantization with separate normalization constants for each inner product in the matrix multiplication to quantize most of the features. For the emergent outliers, however, we also include a new mixed-precision decomposition scheme, which isolates the outlier feature dimensions into a 16-bit matrix multiplication while still more than 99.9% of values are multiplied in 8-bit. Using LLM.int8(), we show empirically that it is possible to perform inference in LLMs with up to 175B parameters without any performance degradation. This result makes such models far more accessible, for example making it possible to use OPT-175B/BLOOM on a single server with consumer GPUs.
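The two-part procedure can be sketched in a few lines of PyTorch. This is a simplified float emulation of the idea, not the optimized kernel: the outlier threshold of 6.0 and the use of float arithmetic to stand in for int8-with-int32-accumulation are simplifying assumptions.

```python
# Simplified sketch of the LLM.int8() decomposition: vector-wise absmax
# quantization for most dimensions, 16-bit matmul for outlier dimensions.
import torch

def int8_mixed_matmul(x, w, threshold=6.0):
    """x: (n, k) activations, w: (k, m) weights, both float tensors."""
    outliers = (x.abs() > threshold).any(dim=0)   # emergent outlier dims
    # 16-bit path: the few outlier feature dimensions.
    y_fp16 = x[:, outliers] @ w[outliers, :]
    # 8-bit path: vector-wise quantization with separate normalization
    # constants per row of x and per column of w.
    xs, ws = x[:, ~outliers], w[~outliers, :]
    cx = xs.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    cw = ws.abs().amax(dim=0, keepdim=True).clamp(min=1e-8) / 127.0
    xi = (xs / cx).round().clamp(-127, 127)
    wi = (ws / cw).round().clamp(-127, 127)
    y_int8 = (xi @ wi) * (cx * cw)                # dequantize with both scales
    return y_fp16 + y_int8
```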
In urban or crowded environments, humans rely on eye contact for fast and efficient communication with nearby people. Autonomous agents also need to detect eye contact to interact with pedestrians and safely navigate around them. In this paper, we focus on eye contact detection in the wild, i.e., real-world scenarios for autonomous vehicles with no control over the environment or the distance of pedestrians. We introduce a model that leverages semantic keypoints to detect eye contact and show that this high-level representation (i) achieves state-of-the-art results on the publicly available dataset JAAD and (ii) conveys better generalization properties than leveraging raw images in an end-to-end network. To study domain adaptation, we create LOOK: a large-scale dataset for eye contact detection in the wild, which focuses on diverse and unconstrained scenarios for real-world generalization. The source code and the LOOK dataset are publicly shared towards an open science mission.
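The high-level idea, classifying from semantic keypoints rather than raw pixels, can be illustrated with a small hypothetical model. The architecture and dimensions below (17 COCO-style joints, a plain MLP) are illustrative assumptions, not the paper's design.

```python
# Illustrative sketch: binary eye-contact classification from 2D keypoints.
import torch
import torch.nn as nn

class KeypointEyeContact(nn.Module):
    def __init__(self, num_keypoints: int = 17):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_keypoints * 2, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),                 # logit: looking / not looking
        )

    def forward(self, keypoints):             # (batch, num_keypoints, 2)
        return self.mlp(keypoints.flatten(1))

model = KeypointEyeContact()
logits = model(torch.rand(8, 17, 2))           # 8 detected pedestrians
probs = torch.sigmoid(logits)
```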
Video capture is the most extensively used source for perceiving humans because it is intuitively understandable. However, a usable video capture often depends on environmental conditions such as ample ambient light, unobstructed space, and a proper camera angle. Wireless measurements, in contrast, are more ubiquitous and have fewer environmental constraints. In this paper, we propose CSI2Video, a novel cross-modal method that leverages only WiFi signals from commercial devices, plus a source of human identity information, to recover fine-grained surveillance video in real time. Specifically, two tailored deep neural networks are designed to conduct the cross-modal mapping and video generation tasks, respectively. We use an auto-encoder-based structure to extract pose features from WiFi frames; the extracted pose features and the identity information are then merged to generate synthetic surveillance video. Our solution produces realistic surveillance video without any expensive wireless equipment and is ubiquitous, cheap, and real-time.
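The cross-modal mapping stage can be sketched as an auto-encoder whose bottleneck carries the pose features that feed the video generator. All shapes below (e.g., a 90-dimensional CSI frame, a 64-dimensional pose code) are assumptions for illustration.

```python
# Illustrative sketch: auto-encoder that compresses WiFi CSI frames into a
# pose-feature bottleneck; the bottleneck is what the generator consumes.
import torch
import torch.nn as nn

class CSIPoseAutoEncoder(nn.Module):
    def __init__(self, csi_dim: int = 90, pose_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(csi_dim, 256), nn.ReLU(),
            nn.Linear(256, pose_dim),          # pose-feature bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(pose_dim, 256), nn.ReLU(),
            nn.Linear(256, csi_dim),           # reconstruct the CSI frame
        )

    def forward(self, csi):
        pose = self.encoder(csi)               # features passed to generator
        recon = self.decoder(pose)             # reconstruction for AE loss
        return pose, recon

pose, recon = CSIPoseAutoEncoder()(torch.rand(32, 90))
```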
Brain-inspired computing proposes a set of algorithmic principles that hold promise for advancing artificial intelligence. They endow systems with self-learning capabilities, efficient energy usage, and high storage capacity. A core concept that lies at the heart of brain computation is sequence learning and prediction. This form of computation is essential for almost all our daily tasks, such as movement generation, perception, and language. Understanding how the brain performs such a computation is important not only to advance neuroscience but also to pave the way to new technological brain-inspired applications. A previously developed spiking neural network implementation of sequence prediction and recall learns complex, high-order sequences in an unsupervised manner via local, biologically inspired plasticity rules. An emerging type of hardware that holds promise for efficiently running this type of algorithm is neuromorphic hardware. It emulates the way the brain processes information and maps neurons and synapses directly into a physical substrate. Memristive devices have been identified as potential synaptic elements in neuromorphic hardware. In particular, redox-induced resistive random access memory (ReRAM) devices stand out in many respects. They permit scalability, are energy efficient and fast, and can implement biological plasticity rules. In this work, we study the feasibility of using ReRAM devices as a replacement for the biological synapses in the sequence learning model. We implement and simulate the model, including the ReRAM plasticity, using the neural simulator NEST. We investigate the effect of different device properties on the performance characteristics of the sequence learning model and demonstrate resilience with respect to different on-off ratios, conductance resolutions, device variability, and synaptic failure.
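The device constraints studied here, a bounded conductance range (on-off ratio) and a finite conductance resolution, can be captured in a few lines. The saturating multiplicative update below is one common idealization of memristive switching, not a specific device's measured kinetics, and all constants are assumptions.

```python
# Hedged sketch: a memristive synapse with bounded, quantized conductance.
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4          # siemens; on-off ratio of 100 (assumed)
LEVELS = 64                        # finite conductance resolution (assumed)

def quantize(g):
    """Snap conductance to the device's discrete levels."""
    step = (G_MAX - G_MIN) / (LEVELS - 1)
    return G_MIN + np.round((g - G_MIN) / step) * step

def potentiate(g, lr=0.1):
    return quantize(g + lr * (G_MAX - g))   # saturating approach to G_MAX

def depress(g, lr=0.1):
    return quantize(g - lr * (g - G_MIN))   # saturating approach to G_MIN
```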
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and task the participants with designing an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating rates of up to 60 FPS when reconstructing Full HD images. A detailed description of all models developed in the challenge is provided in this paper.
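A typical route to a fully-quantized INT8 model of the kind evaluated in the challenge is post-training quantization with TensorFlow Lite. The sketch below is a hedged illustration: the tiny network (convolutions plus depth_to_space for the 3X upscale) and the random representative-dataset stand-in are assumptions, not any participant's solution.

```python
# Sketch: a minimal 3X SR network, converted to a fully-INT8 TFLite model.
import tensorflow as tf

def build_sr_model(scale: int = 3):
    inp = tf.keras.Input(shape=(None, None, 3))
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(3 * scale * scale, 3, padding="same")(x)
    out = tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
    return tf.keras.Model(inp, out)

model = build_sr_model()
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Calibration data: real low-resolution patches in practice; random here.
converter.representative_dataset = lambda: (
    [tf.random.uniform((1, 180, 320, 3))] for _ in range(8)
)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
```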
Artificial intelligence (AI) is today's most widely used approach to making sense of the real world from many types of data, and its central task is finding patterns in the data being analyzed. This is performed by a representative-feature-extraction step, traditionally carried out with statistical algorithms or hand-designed filters. However, selecting useful features from large-scale data is a crucial challenge. With the development of convolutional neural networks (CNNs), feature extraction has become more automatic and easier: CNNs can process large-scale data and cover different scenarios for a given task. In computer vision, convolutional networks are also used to extract features for the other parts of a deep learning (DL) model. Choosing a suitable network for feature extraction, or for any other part of a DL model, is not arbitrary; the choice depends on the target task as well as on the network's computational complexity. Many networks have been proposed and have become well-known building blocks for DL models across AI tasks. Such networks, used for feature extraction or placed at the beginning of a DL model, are called backbones. A backbone is a known network that was previously trained on many other tasks and has demonstrated its effectiveness. In this paper, we give a detailed overview of existing backbones such as VGG, ResNet, and DenseNet. We also discuss several computer vision tasks by reviewing the backbones used for each, and we compare performance across tasks based on the backbone employed.
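The backbone pattern the survey describes is a few lines in practice: take a network pretrained on another task and reuse everything before its classification head as a feature extractor. A minimal sketch with torchvision's ResNet-50 follows; the weights-enum API assumes torchvision 0.13 or later.

```python
# Sketch: reuse a pretrained ResNet-50 as a feature-extraction backbone.
import torch
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()       # drop the ImageNet classifier head
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.rand(1, 3, 224, 224))   # (1, 2048) feature vector
print(feats.shape)
```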
Recent work on audio-visual navigation assumes a single static sound source in a noise-free environment and struggles to generalize to unheard sounds. We introduce a novel dynamic audio-visual navigation benchmark in which an embodied AI agent must catch a moving sound source in an unmapped environment in the presence of distractors and noisy sounds. We propose an end-to-end reinforcement learning approach built on a multi-modal architecture that fuses spatial audio-visual information from a binaural audio signal and spatial occupancy maps to encode the features needed to learn a robust navigation policy for this complex task setting. We demonstrate that our approach outperforms the current state of the art, generalizing better to unheard sounds and proving more robust to noisy scenarios on the two challenging 3D-scanned real-world datasets, Replica and Matterport3D, for both the static and the dynamic audio-visual navigation benchmarks. Our novel benchmark will be made available at http://dav-nav.cs.uni-freiburg.de.
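The fusion idea can be illustrated with a hypothetical actor-critic network: one encoder per modality, concatenated into a shared embedding. All layer sizes, input shapes, and the four-action space below are assumptions for illustration, not the paper's architecture.

```python
# Illustrative sketch: multi-modal policy fusing binaural audio and maps.
import torch
import torch.nn as nn

class AudioVisualPolicy(nn.Module):
    def __init__(self, num_actions: int = 4):
        super().__init__()
        self.audio_enc = nn.Sequential(   # binaural spectrogram, 2 channels
            nn.Conv2d(2, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(32 * 16, 128),
        )
        self.map_enc = nn.Sequential(     # egocentric occupancy map, 1 channel
            nn.Conv2d(1, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(32 * 16, 128),
        )
        self.actor = nn.Linear(256, num_actions)
        self.critic = nn.Linear(256, 1)

    def forward(self, audio, occupancy):
        h = torch.cat([self.audio_enc(audio), self.map_enc(occupancy)], dim=-1)
        return self.actor(h), self.critic(h)   # action logits, state value

policy = AudioVisualPolicy()
logits, value = policy(torch.rand(1, 2, 65, 26), torch.rand(1, 1, 64, 64))
```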
Causal discovery among collections of time-series data can help diagnose the causes of symptoms and, ideally, prevent faults before they occur. However, reliable causal discovery can be very challenging, especially when the data-acquisition rate varies (i.e., non-uniform data sampling) or in the presence of missing data points (e.g., sparse data sampling). To address these issues, we propose a new system comprising two parts: the first fills in missing data with Gaussian process regression, and the second leverages an echo state network, a type of reservoir computer (i.e., a model used for chaotic systems), for causal discovery. We evaluate the performance of our proposed system against three off-the-shelf causal discovery algorithms, namely structural expectation-maximization, sub-sampled linear auto-regression absolute coefficients, and multivariate Granger causality with vector auto-regression, using the Tennessee Eastman chemical dataset. We report the corresponding Matthews correlation coefficients (MCC) and receiver operating characteristic (ROC) curves and show that the proposed system outperforms the existing algorithms, demonstrating its viability for discovering causal relationships in complex systems with missing entries.
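The two-stage pipeline can be sketched directly: Gaussian process regression imputes missing samples, and an echo state network is then driven by the imputed series. Hyperparameters (kernel length scale, reservoir size, spectral radius) and the omitted causal-scoring step are simplifying assumptions; the paper's full procedure is more involved.

```python
# Hedged sketch: GP imputation followed by an echo state network reservoir.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def impute_gp(t, y):
    """t: (n,) sample times; y: (n,) values with NaNs for missing entries."""
    mask = ~np.isnan(y)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0))
    gp.fit(t[mask, None], y[mask])
    y_filled = y.copy()
    y_filled[~mask] = gp.predict(t[~mask, None])
    return y_filled

def reservoir_states(u, n_res=200, rho=0.9, seed=0):
    """Tanh reservoir driven by the imputed input series u of shape (T, d)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, (n_res, u.shape[1]))
    w = rng.uniform(-0.5, 0.5, (n_res, n_res))
    w *= rho / np.max(np.abs(np.linalg.eigvals(w)))   # set spectral radius
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for i, u_t in enumerate(u):
        x = np.tanh(w @ x + w_in @ u_t)
        states[i] = x
    return states   # regress targets on these states to score causal links
```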